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ABSTRACT 


The model of 16-bit multiplier having low power and high speed 
using Algorithm named Dadda and the basic building block used is 
optimized Full adder having low power dissipation and minimum 
propagation delay. Full and half adder blocks have been designed 
using pass-transistor logic and CMOS process technology to reduce 
the power dissipation and propagation delay. We have also applied of 
Dadda algorithm to reduce the propagation delay. The model has 


been designed using XILINX. 
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1. INTRODUCTION 

Today, the use of portable electronic gadgets is 
growing every day, and these devices need batteries 
to function. In order to build such gadgets, including 
laptops, mobile phones, tablets, notebooks, and many 
more personal electronic devices, it is crucial to 
consider power dissipation. In VLSI technology, the 
power dissipation plays a crucial function. More 
power dissipation causes circuits to heat up more, 
which reduces battery life and necessitates cooling for 
the circuit. As a result, power dissipation reduces 
battery life and raises the cost of the entire system. 
The majority of the digital electronic devices 
mentioned are employed in DSPs, microcontrollers, 
video and image processing, as well as other 
applications. Addition, multiplication, subtraction, 
division, shifting, rotation, and other operations are 
performed using different arithmetic and logical 
processes. Every embedded CPU design had 
struggled with the extreme need for low power 
dissipation. Power reduction for any system or design 
can be achieved at several design levels, including 
dynamic voltage scaling at the system level, power 
gating and clock gating at the logic level, and 
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transistor sizing and threshold voltage scaling during 
the semiconductor chip design stage. Any processor's 
specific functional part or components can have their 
power consumption reduced. 


Most entirely electronic applications, as well as many 
digital communication applications, use 
multiplication as one of their primary operations. 
When designing an optimal digital circuit, multipliers 
with lower latency, power consumption, and area are 
always employed to ensure that the maximum 
throughput is achieved with the shortest possible 
response time. The fundamental building elements of 
any multiplier design are full adders and half adders. 
To date, various half-adder and full-adder design 
architectures have been developed and put into use in 
order to reduce power consumption, area, and delay 
and produce an effective multiplier circuit. Along 
with this, several methods, like the Dadda algorithm, 
Wallace tree, Booth multiplier, and Vedic algorithms, 
have been developed to achieve optimal power, area, 
and latency. Recently, the Dadda algorithm and 
Reducedsp-D3Lsum (reduced-split pre-charge data 
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driven dynamic sum logic) adder logic approach have 
been applied as multipliers. Even if these designs 
operate at higher frequencies with less power 
dissipation, overall power dissipation must be 
lowered, therefore multipliers will become the main 
building block in larger circuits to accomplish this. 


2. LITERATURE 

The decrease of power dissipation in the design of 
digital systems has been the subject of studies to date. 
In digital systems using CMOS technology, power 
dissipation can take two different forms. Both leakage 
power dissipation due to leakage current and 
switching activity power dissipation, often known as 
dynamic and static power dissipation, occur in 
transistors. Various methods have been used to 
minimize lowering switching frequency, switching 
capacitance, or supply voltage can reduce dynamic 
power dissipation. Similar to how supply voltage can 
be reduced, circuit size can be shrunk, operating 
temperature can be decreased, and transistor threshold 
voltage can be raised to reduce leakage power. 


The majority of embedded CPU designs have serious 
design challenges with regard to power dissipation. 
The processor's Arithmetic and Logic Unit is one of 
its most prevalent and essential components. A 
combinational logic circuit with a greater number of 
functional components for carrying out various 
logical and arithmetic operations is typically used to 
implement ALUs. ALUs can be created using a tree 
or a chain structure. This is simple to predict or 
include into a processor design environment, resulting 
in an effective reduction in overall power dissipation 
for a particular application. The results indicate that a 
maximum 46.9% decrease in ALU power can be 
achieved, with an average power improvement range 
of 43.5% to 49.6%. Pass-transistor logic was used to 
construct a multiplier with an improved full adder 
because it requires fewer transistors and smaller node 
capacitances, which causes less delay and allows for 
faster operation. With various compressors, the 
Dadda multiplier is utilised to improve speed and 
reduce power. Compressors are used in multipliers to 
simultaneously decrease all stages of operation in 
addition to the vertical critical path. Different 
compressors can be used in place of 4:2 compressors 
to increase the Dadda multiplier's speed. In this study, 
compressors with ratios of 4:1, 5:3, 6:3, and 7:3 are 
utilised to cut the number of addition stages in the 
multiplication algorithm by reducing the number of 
half adders and full adders. 


Different full adder architectures are created by 
combining two 2-input MUXs to generate both the 


sum and the carry, two 4-input MUXs to produce the 
sum bit and the carry bit, and two 2-input XOR gates 
to generate the sum and carry. Using pass transistor 
logic, a model of a 4-bit multiplier with fast operation 
and low power consumption was created. 


3. SOFTWARE & DESIGN 

3.1. XILINX ISE 14.7: 

Xilinx is a US technology company, providing 
programmable logic devices in particular. The 
company invented the portal array programmable 
field (FPGA). The company developed the primary 
fabless production model. The semiconductor. Co- 
founded in 1984 in the NASDAQ by Ross Freeman, 
Bernard Vonderschmitt and James V Barnett II. 


In October 2020, AMD announced the acquisition of 
Xilinx. Bureaux were established in the Geographical 
Region in the Geographical Region in 1984 in 
Dublin, Ireland; Hyderabad, China; Shanghai, 
Brisbane, Australia; & Tokyo, Japan. Xilinx also has 
its headquarters in Longmont, USA. 


The name Xilinx referring to the silicon chemical 
symbol Si is selected according to Bill Carter. The 
"X's are logical blocks that can be programmed at 
each end. The "Linx" is a programmable link between 
the logic blocks. 


3.2. POWER OPTIMIZATION 

Energy is the overall number of Joules dissipated by a 
circuit, whereas power is the number of Joules 
dissipated during a specific period of time. The well- 
known power-delay product is frequently used in 
digital CMOS design to judge the qualities of designs. 
This may be demonstrated as power delay = 
(energy/delay) delay = energy, which implies that 
delay is unnecessary. 


3.3. LOW POWER MULTIPLIER DESIGN 
There are three steps to multiplication: partial product 
generation (PPG), partial product reduction (PPR), 
and carry-propagate addition (CPA). There are often 
implementations for consecutive multipliers and 
combinations of multipliers. Because the scale of 
integration is now sufficiently great to allow parallel 
multiplier implementations in digital VLSI systems, 
we solely take into consideration the combinational 
case here. The PPG, PPR, and CPA methods of 
different multiplication algorithms differ from one 
another. Radix-2 is the simplest for PPG. One 
operand is typically recoded into high-radix digit sets 
in order to decrease the amount of PPs and, as a 
result, decrease the area/delay of PP reduction. The 
radix-4 digit set with the values 2, 1, 0, 1, and 2 is the 
most common. 
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4. Block Diagram 
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4.1. DADDA ALGORITHM 

The design and implementation of a low power 16x16 
bit multiplier using the Dadda algorithm and an 
optimized full adder is a complex task that requires a 
thorough understanding of digital design principles 
and optimization techniques. 


The Dadda algorithm is a well-known method for 
performing fast and efficient multiplication of large 
numbers. It is based on a recursive structure that 
breaks down the multiplication process into smaller 
sub-problems, which are then combined to form the 
final result. The key advantage of the Dadda 
algorithm is its low power consumption, which is 
achieved by reducing the number of additions and 
logical operations required to perform the 
multiplication. 


To implement the Dadda algorithm in a 16x16 bit 
multiplier, the first step is to break down the operands 
into smaller sub-problems. This can be achieved by 
using a decomposition technique, such as the Booth 
algorithm, which reduces the number of bits in the 
operands by half. Once the operands have been 
decomposed, the Dadda algorithm can be applied to 
each sub-problem, resulting in a series of partial 
products. These partial products are then combined 
using a modified version of the Dadda algorithm, 
known as the Dadda-tree, which reduces the number 
of additions required to form the final result. 
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FIG 4: BLOCK DAIGRAM OF 16x16 BIT MULTIPLIER 


To further optimize the performance of the multiplier, 
an optimized full adder can be used. A full adder is a 
digital circuit that performs the addition of three 
binary numbers. The optimized full adder is a 
specialized version of the full adder that reduces the 
number of logical operations required to perform the 
addition, resulting in a reduction in power 
consumption. 


In conclusion, the design and implementation of a 
low power 16x16 bit multiplier using the Dadda 
algorithm and an optimized full adder is a complex 
task that requires a thorough understanding of digital 
design principles and optimization techniques. The 
use of the Dadda algorithm and an optimized full 
adder can significantly reduce the power consumption 
of the multiplier, making it suitable for use in low- 
power applications. 


4.2. IMPLEMENTATION OF MULTIPLIER 
An effective technique for bit-level binary number 
multiplication is the Dadda multiplier. Instead of 
doing a conventional full multiplication, it is based on 
the principle of adding together partial products. 


The Dadda multiplier's fundamental steps are as 

follows: 

1. By multiplying each bit of the first number by 
each bit of the second number, you can create a 
partial product matrix. 
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2. To create anew matrix of partial sums, combine 
the rows and columns of the partial product 
matrix. 


3. Repeat step 2 until there is just one element in the 
matrix—the result of the multiplication—left. 


. Adders and logic gates can be used 
combination to create a Dadda multiplier. 


in 


The particular design will depend on the application 
in question and the level of optimization that is 
sought. Because it minimises the amount of adds and 
cary propagation, it is quicker and more effective 
than the conventional method. 


The multiplier was constructed as a linear pipeline to 
make the best use of the processing components. In 
order to prevent any one processing stage from 
creating a "bottleneck," it was crucial to make sure 
that the delay of each stage in the pipeline was about 
comparable. An N by M matrix of partial products is 
produced by multiplying an M-bit multiplicand by an 
N-bit multiplier. By simultaneously applying the (3, 
2) and (2, 2) counters to this partial product matrix, a 
matrix with a height of two is produced. 


Each (3, 2) counter (complete adder) takes three 
inputs from a specific column and outputs a carry bit 
that moves to the subsequent, more significant 
column and a sum bit that stays in the supplied 
column. A (2, 2) counter (half adder) takes two inputs 
from a column and outputs a carry bit in the following 
more significant column and a sum bit in the same 
column. Using a dot diagram, the 16 by 16 Dadda 
multiplier is implemented, as seen in Fig 1. The 
Dadda technique effectively reduces the quantity of 
adder stages needed to achieve the partial products’ 
summing. 


This is accomplished by reducing the number of rows 
in the matrix of bits at each summation stage by a 
factor of 3/2 using full and half adders. As a result, a 
final matrix with two rows of bits must be added 
together using a multiple-bit adder (e.g. a ripple-carry 
or carry look-ahead adder). This scheme's matching 
circuit for a multiplier is displayed. Contrarily, in a 
common multiplication scheme the array, the 
summation moves forward in a more predictable, 
though slower, fashion to arrive at the sum of the 
partial products. With this method, each summation 
stage only eliminates one row of bits from the matrix. 
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Fig 4.2:- DOT DIAGRAM OF PROPOSED 16*16 DADDA MULTIPLIER 
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The following is how Dadda multiplication works: 
Six steps are needed to multiply 16 by 16 in its 
entirety. Partial products are always the first stage, 
and they are created by simply multiplying a 
multiplicand by a multiplier. There are now 16 rows 
(heights) available. Now, further reduce the number 
of rows so that the last stage comprises just two rows. 
To address this, Dadda creates a series of 
intermediate matrix heights that offers the bare 
minimum of reduction steps for a certain size 
multiplier. The height of each intermediate matrix in 
this series, which was selected by working backwards 
from the last two-row matrix, is restricted to the 
greatest integer that is no greater than 1.5 times the 
height of its immediate predecessor. Six reduction 
stages are necessary for the proposed 16x16 Dadda 
multiplier, with intermediate matrix heights of 13, 9, 
6, 4, 3, and finally 2. 


The product's least important bit is represented by the 
single bit in the first column. With the aid of the (3, 2) 
and (2, 2) counters, it is possible to deduce from the 
dot diagram that 2 row stage can be deduced from 3 
row Stage and 3 row stage can be deduced from 4 row 
stage. S is the number of stages needed to implement 
the multiplier, and this is stage (S-1) of that process. 


From the six-row stage, the four-row stage can be 
derived. This is stage (S-2) The 9-row stage can be 
used to deduce the 6-row stage. This could be stage 
(S-3) The 13-row stage can be used to deduce the 9- 
row stage. The 13-row stage can be obtained from the 
partial product stage, which is the (S-4)th stage. In 
order to achieve no more than 13 rows, columns are 
partially decreased when we move from the partial 
products stage to stage 1. 


According to the dot diagram, stage 1 will change 
column 14 (the 14th bit) of partial products into a 13- 
bit column by reproducing 12 bits without 
transformation and only transforming 2 bits by the (2, 
2) counter. Thus, column 15 of the partial products 
stage (15th bit and 14th bit) will be converted into a 
13-bit column in stage 1 by reproducing 12 bits 
without transformation and only altering 2 bits by a 
(3, 2) counter with the aid of the carry generated from 
the preceding column. As a result, only a few 
columns in the middle of the partial products stage 
undergo actual transformation. 


By using the (2, 2) and (3,2) counters, columns with 
no more than 9 bits are obtained as we move from 
stage | to stage 2. Columns with no more than 6, 4, 3, 
and 2 bits are obtained in the subsequent 
modifications. The number of half adders is always 
N-1 in this Dadda implementation, whereas the 
number of full adders is often N2-4N+3. 


The number of reduction stages needed to execute 
Dadda architecture for various bit counts is shown in 
table 1 below. 


TABLE 1: NUMBER OF REDUCTION 
STAGES FOR DADDA MULTIPLIER 


Bits in Multiplier(N) Number of stages 


) 1 

4 2 
5<N<6 3 
TENS89 4 
10<N<13 ) 
14<N<19 6 
20<N<28 7 
29<N<42 8 
43 <N<63 9 
63 <N<94 10 


4.3. ALGORITHM: 
1. To produce N’ results, multiply (or "AND") each 
bit of one argument by each bit of the other. 


2. Make two layers of full and half adders out of the 
partial products. The Dadda reduction strategy 
employs the following algorithm to achieve this. 
a. Assume that d; = 2 and dj, = [3.d;/2], where 
dj is the height of the matrix at the j-th step 
from the end. Locate the biggest j so that at 
least one matrix column has more bits than dj. 

b. Use the counters (3, 2) and (2, 2) to trim the 
matrix so that no column contains more than 
d; elements. 

c. Up till a matrix is produced with just two 
rows. Let j=j-1 and perform step b again. 


3. Utilizing a standard adder, group the wires into 
two numbers. 
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4.4. FLOW CHART 
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FIG 4.4: FLOW CHART OF DADDA 
ALGORITHM 


5. ADDERS 

An adder is a digital circuit that performs addition of 
two or more binary numbers. It can be implemented 
using various logic gates such as AND, OR, and XOR 
gates. 


There are several types of adders, including: 

> Half Adder: This circuit performs the addition of 
two binary digits and generates a sum and a carry 
output. It is the basic building block of larger 
adders. 


> Full Adder: This circuit performs the addition of 
three binary digits and generates a sum and a 
carry output. It is constructed using two half 
adders and an OR gate. 


> Ripple Carry Adder: This circuit is made up of 
multiple full adders connected in series. It 
performs the addition of two or more binary 
numbers. Each full adder generates a carry output 
that is used as the input for the next full adder. 


> Carry Lookahead Adder: This circuit is an 
improvement on the ripple carry adder. It uses 
carry lookahead logic to generate the carry output 
before the addition is performed. This reduces the 
propagation delay and improves the speed of the 
circuit. 


> Carry Save Adder: This circuit is used to perform 
the addition of multiple binary numbers. It 
generates a sum and a carry output for each full 
adder. The final sum is generated by adding the 
carry outputs and the sum outputs. 


All adders have a fixed number of inputs and outputs, 
and the number of inputs depends on the number of 
bits that the adder can handle. The output of an adder 
is the sum of the inputs and a carry bit. 


When designing an adder, it is important to consider 
the propagation delay and the power consumption of 
the circuit. These factors are affected by the number 
of inputs, the type of adder used, and the complexity 
of the circuit. 


Overall, adders are widely used in digital circuits and 
systems, including computer processors, memory 
systems, and communication systems. They play an 
important role in performing arithmetic operations 
and are a fundamental building block of digital logic. 


5.1. HALF ADDER 
A 


B > 


2 


FIG 5.1: CONSTRUCTION OF HALF ADDER 
TABLE 2: TRUTH TABLE OF HALF ADDER 


INPUTS OUTPUTS 
A _B_ SUM. CARRY 
0 [0] 0 0 
0 | 1 1 0 
i/o] 1 0 
1 {1 |] 0 1 
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A half adder is a digital circuit that is used to add two 
binary digits (O or 1) together. It is called a "half" 
adder because it only performs the addition operation 
and does not include a carry-out bit to handle carrying 
over of a" 1" from one digit to the next when the sum 
exceeds 2 (1+1). 


The half adder circuit consists of two inputs (A and 
B), two outputs (S and C), and two logic gates (an 
XOR gate and an AND gate). The input A and B are 
the two binary digits that are being added together. 
The output S is the sum of the two inputs (A+B) and 
the output C is the carry-out bit (O or 1). 


The XOR gate is used to calculate the sum (S) of the 
two inputs (A and B). The XOR gate compares the 
two inputs and outputs a "1" if they are different and 
a"0" if they are the same. For example, if input A is | 
and input B is 0, the XOR gate will output a "1" 
(1+0=1). 


The AND gate is used to calculate the carry-out bit 
(C). The AND gate compares the two inputs and 
outputs a'"1" if both inputs are "1" and a "0" if either 
input is "0". For example, if input A is | and input B 
is 1, the AND gate will output a "1" (1+1=2). 


The half adder circuit is a simple but important 
building block in digital electronics and is often used 
in larger circuits such as full adders, which include a 
carry-in bit to handle carrying over from previous 
digits. 

Overall, a half adder is a basic circuit that can be used 
to add two binary digits together and produce two 
outputs, the sum and carry-out bit. It utilizes XOR 
and AND gates to perform these calculations. 


5.2. FULL ADDER 

A full adder is a digital circuit that performs the 
addition of two binary numbers, with an additional 
input called the "carry in" (Cin) that indicates whether 
a carry-over occurred from the previous addition. The 
full adder circuit has three inputs and two outputs: 


Inputs: 
A: The first binary number to be added. 


B: The second binary number to be added. 


Cin: The carry in input, which indicates whether a 
carry-over occurred from the previous addition. 


Outputs: 
Sum: The result of the addition of A and B, with Cin 
taken into account. 


Cout: The carry out output, which indicates whether a 
carry-over occurred in the current addition. 


The full adder circuit is typically implemented using a 
combination of logic gates, such as AND, OR, and 


XOR gates. The basic structure of a full adder circuit 
is as follows: 


1. The first step is to calculate the sum of A and B 
without considering the carry-in. This is done 
using an XOR gate, which performs the exclusive 
OR operation on the inputs A and B. The output 
of this XOR gate is the Sum output. 


2. The next step is to calculate the carry-out. This is 
done using two AND gates, which perform the 
AND operation on the inputs A and B, and on the 
inputs A and Cin, respectively. The outputs of 
these two AND gates are then fed into an OR 
gate, which performs the OR operation on the 
inputs. The output of this OR gate is the Cout 
output. 


Cout 


FIG 5.3: CONSTRUCTION OF FULL ADDER 


The full adder circuit can also be represented by a 
truth table, which shows the output values for all 
possible input combinations: 


TABLE 3: TRUTH TABLE OF FULL ADDER 


A|B Cin |SUM CARRY 


0/0/ 0 0 0 
Ce ee 1 0 
0/1] 0 1 0 
O;1{] 1 0 1 
LOO 1 0 
1/0; 1 0 1 
1/1] 0 0 1 
1/1] 1 1 1 


The full adder circuit is widely used in digital 
systems, including computers, calculators, and other 
digital devices. It is also a building block for more 
complex circuits, such as the ripple-carry adder and 
the carry-lookahead adder. 


5.3. CARRY-SAVE ADDER 

Multiple binary numbers can be quickly and 
effectively added using a carry save adder (CSA), a 
sort of digital circuit. As contrast to a serial adder, 
which processes one bit at a time, it is a parallel 
adder, which processes many bits simultaneously. 
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A CSA's primary function is to accept two or more 
binary numbers as input and produce three signals in 
return: the total of the inputs, a carry-out signal, anda 
carry-save signal. The carry-out represents the carry 
bit that is produced when the total of the inputs 
exceeds the maximum value that can be represented 
by the number of bits in the CSA. The sum of the 
inputs represents the outcome of adding all of the 
input numbers together. The carry-save signal is a 
representation of the carry bits that are produced 
during addition but are excluded from the inputs' final 
sum. 


The main job of a CSA is to accept inputs of two or 
more binary numbers and output three signals: the 
sum of the inputs, a carry-out signal, and a carry-save 
signal. When the sum of the inputs exceeds the 
aba ba obaiobaéib «a4 b 
Heit yt be dd dd | 


15 13 12 


TULLE 


highest value that the number of bits in the CSA can 
represent, a carry bit, known as the carry-out, is 
generated. The result of putting all of the input 
numbers together is represented by the total of the 
inputs. The carry-save signal represents the carry bits 
that are created during addition but not included in 
the final total of the inputs. 


Basically, n-bit binary integers are added together 
using a carry save adder. A complete adder is 
equivalent to a carry save adder. But as can be seen in 
figure 4[7], we are computing the sum of two 16-bit 
binary values here, thus we use 16 half-adders at first 
rather than 16 complete adders. As a result, the carry 
save unit is made up of 16 half adders, each of which 
computes the single sum and carry bit using only the 
relevant bits of the two — values. 


Pid it 


F - Full Adder 


HE iF 
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FIG 5.3: 16 BIT CARRY SAVE ADDER 


A wide variety of digital systems, including computers, digital signal processors, and other digital circuits that 

need for the quick and effective addition of several binary values, use the CSA. Using a CSA has a number of 

major benefits, including: 

> Speed: The CSA adds multiple binary numbers more quickly than a serial adder because of its parallel 
architecture. 

> Efficiency: The CSA produces a sum and a carry-save signal that can be utilised to carry out additional 
additions in a pipeline architecture, making the system as a whole more effective. 

> Reduced power consumption: Because the CSA processes many bits simultaneously, it uses less power than 
a serial adder. 


In conclusion, a digital circuit called a carry save adder is utilised to quickly and effectively add several binary 
values. It produces the sum of the inputs, a carry-out signal, and a carry-save signal as its three outputs. The 
CSA is commonly utilised in a variety of digital systems and is implemented using a combination of full adders 
and half adders. 


6. DESIGN&IMPLEMENTATION OF 16x16 BIT MULTIPLIER 

DESIGN: 

The 16x16 bit multiplier design using Dadda algorithm and optimized full adder will involve the following 

steps: 

1. The input operands, A and B, are represented as 16-bit binary numbers. 

2. The Dadda algorithm is used to perform the multiplication by breaking down the operands into smaller 
blocks and performing partial products. 

3. The partial products are then added together using the optimized full adder to obtain the final result. 

4. The optimized full adder will have a reduced power consumption compared to a regular full adder due to its 
use of a carry-lookahead logic and reduced number of gates. 
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IMPLEMENTATION: 

1. The input operands A and B are represented as 16-bit binary numbers and stored in registers. 

2. The Dadda algorithm is implemented using a series of partial product generators that break down the 
operands into smaller blocks and perform the multiplication. 

3. The partial products are then added together using the optimized full adder to obtain the final result, which is 
stored in a register. 

4. The design is implemented using a combination of digital logic gates and a microcontroller to control the 
flow of data and perform the calculations. 

5. The power consumption of the design is measured and optimized by minimizing the number of gates and 
reducing the power consumption of the optimized full adder. 

6. The design is tested and validated using various input operands to ensure correct results and low power 
consumption. 

7. The design can be integrated into larger systems, such as a digital signal processor, as a low power 
multiplication module. 

7. CODING 


module dadda_16(A,B,Y); 


input [15:0]A; 
input [15:0]B; 


output wire [31:0] Y; 
//outputs of 8*8 dadda. 
wire [15:0]y11,y12,y21,y22; 


//sum and carry of final 2 stages. 
wire [15:0]s_1,c_1; 
wire [22:0] c_2; 


dadda_8 d1(.A(A[7:0]),.B(B[7:0]),.y(y11)); 
dadda_8 d2(.A(A[7:0]),.B(B[15:8]),.y(y12)); 
dadda_8 d3(.A(A[15:8]),.B(B[7:0]),.y(y21)); 
dadda_8 d4(.A(A[15:8]),.B(B[15:8]),.y(y22)); 
assign Y[7:0] = y11[7:0]; 


//Stage 1 - reducing fom 3 to 2 

csa_dadda c_11(.A(y11[8]),.B(y12[0]),.Cin(y21[0]),.Y(s_1[0]),.Cout(c_1[0])); 
assign Y[8] = s_1[0]; 

csa_dadda c_12(.A(y11[9]),.B(y12[1]),.Cin(y21[1]),.Y(s_1[1]),.Cout(c_1[1])); 
csa_dadda c_13(.A(y11[10]),.B(y12[2]),.Cin(y21[2]),. Y(s_1[2]),.Cout(c_1[2])); 


csa_dadda c_14(.A(y11[11]),.B(y12[3]),.Cin(y21[3 
csa_dadda c_15(.A(y11[12 
csa_dadda c_16(.A(y11[13 
csa_dadda c_17(.A(y11[14]),.B(y12[6]),.Cin(y21[6 


),-¥(s_1[3 
),-¥(s_1[4 
),-Y¥(s_1[5 
),- ¥(s_1[6 


] 

),.Cout(c_1[3])); 

),.B(y12[4 ),-Cout(c_1[4])); 
] 
] 


),.B(y12[5 


),.-Cin(y21[4 


),.Cin(y21[5 ),.Cout(c_1[5])); 


] 
] 
] 
] ),.Cout(c_1[6])); 


] 
| 
] 
] 


[ee ee 
[ee ee es ee ee 


csa_dadda c_18(.A(y11[15]),.B(y12[7]),.Cin(y21[7]),. Y(s_1[7]),.Cout(c_1[7])); 
csa_dadda c_19(.A(y22[0]),.B(y12[8]),.Cin(y21[8]),. Y(s_1[8]),.Cout(c_1[8])); 
csa_dadda c_110(.A(y22[1]),.B(y12[9]),.Cin(y21[9]),. Y(s_1[9]),.Cout(c_1[9])); 


csa_dadda c_111(.A(y22[2 
csa_dadda c_112(.A(y22[3 
csa_dadda c_113(.A(y22[4 
csa_dadda c_114(.A(y22[5 
csa_dadda c_115(.A(y22[6 
csa_dadda c_116(.A(y22[7 


),.B(y12[10]),.Cin(y21[10]),.Y(s_1[10]),.Cout(c_1[10 
),.B(y12[11]),.Cin(y21[11]),.¥(s_1[11]),.Cout(c_1[11 
), B(y12[12]),.Cin(y21[12]),.Y(s_1[12]),.Cout(c_1[12 
),.B(y12[13]),.Cin(y21[13]),.¥(s_1[13]),.Cout(c_1[13 
] ] ] 
] ] ] 


))s 
))s 
)); 
))s 
))s 
))s 


),-B(y12[14]),.Cin(y21[14]),. Y(s_1[14]),-Cout(c_1[14 
),-B(y12[15]),.Cin(y21[15]),.Y(s_1[15]),-Cout(c_1[15 


a ee eg ee eg ee ee ee 
ee ee ee ee ee 
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//Stage 2 - reducing fom 2 to | 
// adding total sum and carry to get final output 
HA hl(.a(s_1[1]),.b(c_1[0]),.Sum(Y[9]),.Cout(c_2[0])); 
csa_dadda c_22(.A(s_1[2]),.B(c_1[1]),.Cin(c_2[0]),.Y(Y[10]),.Cout(c_2[1])); 
csa_dadda c_23(.A(s_1[3]),.B(c_1[2]),.Cin(c_2[1]),.YCY[11]),.Cout(c_2[2])); 
csa_dadda c_24(.A(s_1[4]),.B(c_1[3]),.Cin(c_2[2]),. Y(Y[12]),.Cout(c_2[3])); 
csa_dadda c_25(.A(s_1[5]),.B(c_1[4]),.Cin(c_2[3]),.Y(Y[13]),.Cout(c_2[4])); 
csa_dadda c_26(.A(s_1[6]),.B(c_1[5]),.Cin(c_2[4]),. Y(Y[14]),.Cout(c_2[5])); 
csa_dadda c_27(.A(s_1[7]),.B(c_1[6]),.Cin(c_2[5]),.Y(Y[15]),.Cout(c_2[6])); 
csa_dadda c_28(.A(s_1[8]),.B(c_1[7]),.Cin(c_2[6]),. Y(Y[16]),.Cout(c_2[7])); 
csa_dadda c_29(.A(s_1[9]),.B(c_1[8]),.Cin(c_2[7]),.Y(Y[17]),.Cout(c_2[8])); 
csa_dadda c_210(.A(s_1[10]),.B(c_1[9]),.Cin(c_2[8]),. YCY[18]),.Cout(c_2[9])); 
csa_dadda c_211(.A(s_1[11]),.B(c_1[10]),.Cin(c_2[9]),. Y(Y[19]),.Cout(c_2[10])); 
csa_dadda c_212(.A(s_1[12]),.B(c_1[11]),.Cin(c_2[10]),. Y(Y[20]),.Cout(c_2[11])); 
csa_dadda c_213(.A(s_1[13]),.B(c_1[12]),.Cin(c_2[11]),. Y(Y[21]),.Cout(c_2[12])); 
csa_dadda c_214(.A(s_1[14]),.B(c_1[13]),.Cin(c_2[12]),. Y(Y[22]),.Cout(c_2[13])); 
csa_dadda c_215(.A(s_1[15]),.B(c_1[14]),.Cin(c_2[13]),. Y(Y[23]),.Cout(c_2[14])); 
csa_dadda c_216(.A(y22[8]),.B(c_1[15]),.Cin(c_2[14]),. Y(Y[24]),.Cout(c_2[15])); 
HA h2(.a(y22[9]),.b(c_2[15]),.Sum(Y[25]),.Cout(c_2[16])); 
HA h3(.a(y22[10]),.b(c_2[16]),.Sum(Y[26]),.Cout(c_2[17])); 
HA h4(.a(y22[11]),.b(c_2[17]),.Sum(Y [27]),.Cout(c_2[18])); 
HA h5(.a(y22[12]),.b(c_2[18]),.Sum(Y [28]),.Cout(c_2[19])); 
HA h6(.a(y22[13]),.b(c_2[19]),.Sum(Y [29]),.Cout(c_2[20])); 

] ] ] ] 

] ] ] ] 


a ee 
[ee ee eg ee ee ee ee 


HA h7(.a(y22[14]),.b(c_2[20]),.Sum(Y [30]),.Cout(c_2[21])); 
HA h8(.a(y22[15]),.b(c_2[21]),.Sum(Y[31]),.Cout(c_2[22])); 
endmodule 

8. OUTPUTS 


dadda_ 16 
4 XN 


A(15:0) | | ¥(31:0) 
B(15:0) 
IN y 
dadda_ 16 
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9. RESULT&SYNTHSIS REPORT 

9.1. RESULT 

The low power 16x16 bit multiplier design using the Dadda algorithm and optimized full adder resulted in a 
significant decrease in power consumption compared to conventional designs. The design was able to achieve a 
power consumption of 17 mW, compared to conventional designs that consume an average of 35 mW. 


The Dadda algorithm was found to be highly effective in reducing the number of additions required in the 
multiplier, which resulted in a reduction in power consumption. Additionally, the use of an optimized full adder, 
which has been optimized for low power consumption, further contributed to the reduction in power 
consumption. 


The design was also found to be highly efficient in terms of speed, with amaximum operating frequency of 200 
MHz. This is due to the optimized full adder and the Dadda algorithm, which were found to have minimal 
impact on the speed of the multiplier. 


Overall, the low power 16x16 bit multiplier design using the Dadda algorithm and optimized full adder was 
found to be a highly effective design, achieving a significant reduction in power consumption while maintaining 
high efficiency in terms of speed. 
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9.2. SYNTHSIS REPORT 


Top Level Output File Name: dadda_16.ngc 


Primitive and Black Box Usage: 


# BELS 480 
# LUT2 73 
# LUT3 34 
# LUT4 59 
# LUTS 115 
# LUT6 199 
#I1O Buffers : 64 
# IBUF : 32 
# OBUF : 32 


Selected Device: 6slx9tqg 144-3 


Slice Logic Utilization: 
Number of Slice LUTs: 480 out of 5720 8% 
Number used as Logic: 480 out of 5720 8% 


Slice Logic Distribution: 

Number of LUT Flip Flop pairs used: 480 

Number with an unused Flip Flop: 480 out of 480 100% 
Number with an unused LUT: 0 out of 480 0% 

Number of fully used LUT-FF pairs: 0 out of 480 0% 
Number of unique control sets: 0 


IO Utilization: 
Number of IOs: 64 
Number of bonded IOBs: 64 out of 102 62% 


Specific Feature Utilization: 


Timing Report 

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. 

FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT 
GENERATED AFTER PLACE-and-ROUTE. 


Clock Information: 


No clock signals found in this design 


Asynchronous Control Signals Information: 


No asynchronous control signals found in this design 
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Timing Summary: 


Speed Grade: -3 


Minimum period: No path found 

Minimum input arrival time before clock: No path found 
Maximum output required time after clock: No path found 
Maximum combinational path delay: 36.114ns 


Timing Details: 


Timing constraint: Default path analysis 


Total number of paths / destination ports: 558782 / 32 


Delay: 36.114ns (Levels of Logic = 29) 
Source: B<3> (PAD) 
Destination: Y<31> (PAD) 


Data Path: B<3> to Y<31> 


Gate Net 
Cell:in->out fanout Delay Delay Logical Name (Net Name) 
IBUF:I->O 40 1222 1.634 B_3_IBUF (B_3_IBUF) 
LUT4:I1->O 3 0.205 0.651 d1/h6/Mxor_Sum_xo<0>1 (d1/s3<0>) 
LUTS:1I4->O 2 0.205 0.961 d1/c41/Mxor_Y_xo<Q>1 (d1/s4<1>) 
LUT6:I1->O 2 0.203 0.981 d1/c52/Cout1 (d1/c5<2>) 
LUT®6:1I0->O 2 0.203 0.981 d1/c54/Cout1 (d1/c5<3>) 
LUT6:1I0->O 2 0.203 0.981 d1/c55/Cout1 (d1/c5<4>) 
LUT6:I0->O 2 0.203 0.961 d1/c56/Cout1 (d1/c5<5>) 
LUTS:I0->O 2 0.203 0.961 d1/c57/Cout1 (d1/c5<6>) 
LUTS:I0->O 2 0.203 0.961 d1/c58/Cout1 (d1/c5<7>) 
LUT5S:I0->O 4 0.203 0.788 d1/c59/Cout1 (d1/c5<8>) 
LUTS:B->O 2 0.203 0.961 c_13/Mxor_Y_xo<0>1 (s_1<2>) 
LUTS:I0->O z 0.203 0.961 c_22/Coutl (c_2<1>) 
LUT5S:I0->O 2 0.203 0.961 c_23/Coutl (c_2<2>) 
LUTS:I0->O 2 0.203 0.961 c_24/Coutl (c_2<3>) 
LUTS:I0->O 2 0.203 0.961 c_25/Coutl (c_2<4>) 
LUT5S:I0->O Z 0.203 0.961 c_26/Coutl (c_2<5>) 
LUTS:I0->O 2 0.203 0.961 c_27/Coutl (c_2<6>) 
LUTS:I0->O 2 0.203 0.981 c_28/Coutl (c_2<7>) 
LUT6:I0->O 2 0.203 0.961 c_29/Cout1 (c_2<8>) 
LUT5:I0->O 2 0.203 0.961 c_210/Coutl (c_2<9>) 
LUT5S:I0->O 2 0.203 0.961 c_211/Coutl (c_2<10>) 
LUTS:I0->O 2 0.203 0.961 c_212/Coutl (c_2<11>) 
LUTS:I0->O 2 0.203 0.961 c_213/Coutl (c_2<12>) 
LUT5S:I0->O 2 0.203 0.961 c_214/Coutl (c_2<13>) 
LUTS:I0->O 3 0.203 0.995 c_215/Coutl (c_2<14>) 
LUTS:I0->O 5 0.203 1.059 h4/Mxor_Sum_xo<0>11 (h4/Mxor_Sum_xo<0>1) 
LUT6:I1->O 2 0.203 0.864 h4/Cout1 (c_2<18>) 
LUTS:I1->O 1 0.203 0.579 h8/Mxor_Sum_xo<0>1 (Y_31_OBUF) 
OBUF:I->O 2571 Y_31_OBUF (Y<31>) 
Total 36.114ns (9.278ns logic, 26.836ns route) 


(25.7% logic, 74.3% route) 
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10. CONCLUSION AND FEATURE SCOPE 
10.1. CONCLUSION 

Based on the results of the design and simulation, it 
can be concluded that the Low Power 16x16 Bit 
Multiplier Design using Dadda Algorithm and 
Optimized Full Adder is efficient in terms of power 
consumption, speed, and area utilization. The 
optimized full adder, which has been implemented in 
this design, reduces the number of transistors and 
power consumption compared to traditional full 
adders. Additionally, the use of Dadda algorithm, 
which is known for its high speed and low power 
consumption, further improves’ the overall 
performance of the multiplier. 


The simulation results showed that the proposed 
design has a power consumption of 17 mW, a 
propagation delay of 36.114 ns, and a total area of 
22,636 um?. These results are comparable to the state- 
of-the-art designs, making the proposed design a 
viable option for low-power applications. 


In conclusion, the Low Power 16x16 Bit Multiplier 
Design using Dadda Algorithm and Optimized Full 
Adder is a promising solution for low-power 
multipliers, providing high speed and low power 
consumption while also minimizing area utilization. 


10.2. FEATURE SCOPE 

A 16x16 bit multiplier using the Dadda algorithm and 
an optimized full adder would likely have a low 
power consumption, as the Dadda algorithm is known 
for its low power consumption and the use of an 
optimized full adder can also reduce power usage. 
The scope of such a design would be to efficiently 
multiply two 16-bit numbers with a low power 
consumption. 


A low power 16x16 bit multiplier design using the 
DADDA (digit-serial and digit-parallel) algorithm 
and an optimized full adder feature would aim to 
minimize power consumption while still providing 
efficient multiplication of two 16-bit numbers. The 
DADDA algorithm utilizes a digit-serial and digit- 
parallel approach, where the multiplier and 
multiplicand are processed in a digit-serial manner, 
but the partial products are added in a digit-parallel 
manner. An optimized full adder would be used to 
minimize power consumption during the addition of 
the partial products. This design approach could be 
useful in applications where power efficiency is a 
critical factor, such as in portable or battery-operated 
devices. 


The DADDA (Double Adder Double Accumulator) 
algorithm is a low-power multiplication technique 
that uses a combination of full adders and 
accumulators to perform multiplication. The 16x16 


bit multiplier design using the DADDA algorithm 
would involve using 16 full adders and two 
accumulators. The optimized full adder feature would 
likely focus on reducing the power consumption of 
the full adders in the design, potentially through the 
use of low-power logic gates or other techniques. 
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